Generating multiple-accent pronunciations for TTS using joint sequence model interpolation

Authors

BalaKrishna Kolluru

Vincent Wan

Javier Latorre

Kayoko Yanagisawa

Mark J. F. Gales

Abstract

Standard grapheme-to-phoneme (G2P) systems are trained on a homogeneous lexicon, for example one associated with a particular accent. In practice, a synthesis system may be required to handle multiple accents. Furthermore, a speaker rarely has a pure accent; accents vary continuously within and between regions of a country. Generating phonetic sequences for each accent separately is possible, but combining them into a single pronunciation for synthesis is highly challenging. To address this problem, this paper considers a space of accents. The bases of this space are statistical G2P models in the form of graphone models, and a linear combination of these models defines the accent space. By selecting a point in this continuous space, it is possible to specify the accent of an individual speaker. The approach is evaluated using an accent space defined by American, Scottish and British English. By moving around the accent space, it is shown that speech can be synthesized in all of these accents as well as at a range of intermediate points.
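The abstract only states that an accent is a point in a space spanned by linearly combined graphone models. The Python sketch below shows one plausible reading of that idea under assumptions that are not taken from the paper: each basis accent is a toy graphone bigram model stored as a plain dictionary (real graphone models use longer histories and smoothing), the combination is applied to conditional graphone probabilities with non-negative weights summing to one, and a pronunciation is chosen by rescoring a small hypothetical candidate list rather than by full decoding. The names `interpolate`, `log_prob`, `best_pronunciation`, the phone labels and all probability values are illustrative only.

```python
import math
from typing import Dict, List, Sequence, Tuple

# A graphone pairs a grapheme chunk with a phoneme chunk; ("r", "") means a silent r.
Graphone = Tuple[str, str]
# Hypothetical representation: (previous graphone, graphone) -> P(graphone | previous).
# A bigram history keeps the sketch short; real graphone models use longer, smoothed histories.
BigramModel = Dict[Tuple[Graphone, Graphone], float]

START: Graphone = ("<s>", "<s>")


def interpolate(models: Sequence[BigramModel], weights: Sequence[float]) -> BigramModel:
    """Linear combination sum_k w_k * P_k, i.e. one point in the accent space."""
    if len(models) != len(weights):
        raise ValueError("need exactly one weight per basis model")
    if any(w < 0 for w in weights) or not math.isclose(sum(weights), 1.0):
        raise ValueError("weights must be non-negative and sum to 1")
    keys = set().union(*(m.keys() for m in models))
    # A missing entry means probability 0 under that basis model.
    return {k: sum(w * m.get(k, 0.0) for m, w in zip(models, weights)) for k in keys}


def log_prob(model: BigramModel, seq: List[Graphone]) -> float:
    """Log probability of a graphone sequence; -inf if any bigram is unseen."""
    total, prev = 0.0, START
    for g in seq:
        p = model.get((prev, g), 0.0)
        if p <= 0.0:
            return float("-inf")
        total += math.log(p)
        prev = g
    return total


def best_pronunciation(model: BigramModel, candidates: List[List[Graphone]]) -> List[Graphone]:
    """Rescore a hypothetical candidate list and keep the most probable graphone sequence."""
    return max(candidates, key=lambda seq: log_prob(model, seq))


# Toy basis models for the word "car"; the probabilities are invented for illustration.
rhotic_pron = [("c", "k"), ("a", "aa"), ("r", "r")]
non_rhotic_pron = [("c", "k"), ("a", "aa"), ("r", "")]
candidates = [rhotic_pron, non_rhotic_pron]

rhotic_model: BigramModel = {
    (START, ("c", "k")): 1.0,
    (("c", "k"), ("a", "aa")): 1.0,
    (("a", "aa"), ("r", "r")): 0.9,
    (("a", "aa"), ("r", "")): 0.1,
}
non_rhotic_model: BigramModel = {
    (START, ("c", "k")): 1.0,
    (("c", "k"), ("a", "aa")): 1.0,
    (("a", "aa"), ("r", "r")): 0.1,
    (("a", "aa"), ("r", "")): 0.9,
}

# An intermediate accent weighted 30% rhotic and 70% non-rhotic.
mixed = interpolate([rhotic_model, non_rhotic_model], [0.3, 0.7])
print(best_pronunciation(mixed, candidates))  # prints [('c', 'k'), ('a', 'aa'), ('r', '')]
```

Because each basis model's conditional distributions sum to one, any convex combination of them does too, so every weight vector on the simplex yields a valid model. Moving the weights from (1, 0) toward (0, 1) shifts the preferred pronunciation of the toy word from the rhotic to the non-rhotic form.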

Similar resources

Predicting Pronunciations with Syllabification and Stress with Recurrent Neural Networks

Word pronunciations, consisting of phoneme sequences and the associated syllabification and stress patterns, are vital for both speech recognition and text-to-speech (TTS) systems. For speech recognition, phoneme sequences for words may be learned from audio data. We train recurrent neural network (RNN) based models to predict the syllabification and stress pattern for such pronunciations making...

Intonation modeling for TTS using a joint extraction and prediction approach

This paper presents a joint extraction and prediction framework for intonation modeling. The intonation model is based on a superpositional approach using Bézier curves. The components are attached to minor phrase and accent group. A greedy algorithm performs successive partitions on training data using linguistic information. The parameters related to each partition are obtained using a global ...

Automatic rule generation for linguistic features analysis using inductive learning technique: linguistic features analysis in TOS drive TTS system

The linguistic features analysis for input text plays an important role in achieving natural prosodic control in text-to-speech (TTS) systems. In a conventional scheme, experts refine suspicious if-then rules and change the tree structure manually to obtain correct analysis results when input texts that have been analyzed incorrectly. However, altering the tree structure drastically is difficul...

Data-driven phonetic comparison and conversion between South African, British and American English pronunciations

We analyse pronunciations in American, British and South African English pronunciation dictionaries. Three analyses are performed. First the accuracy is determined with which decision tree based grapheme-to-phoneme (G2P) conversion can be applied to each accent. It is found that there is little difference between the accents in this regard. Secondly, pronunciations are compared by performing pai...

Comparing direct G2P with G2P followed by accent conversion when determining pronunciations for South African English

It has been shown that techniques known as grapheme-and-phoneme-to-phoneme (GP2P) conversion can be used to derive pronunciations in a poorly-resourced accent, such as South African English, using available pronunciations in better-resourced accents of the same language, such as British and American English. However, if the pronunciation is not available in either accent, it must be obtained usi...

Publication date: 2014